Abstract

Invariance to geometric transformations is a highly desirable property of automatic
classifiers in many image recognition tasks. Nevertheless, it is unclear to what extent
state-of-the-art classifiers are invariant to basic transformations such as rotations
and translations. This is mainly due to the lack of general methods that properly measure
such invariance. In this paper, we propose a rigorous and systematic approach
for quantifying the invariance to geometric transformations of any classifier. Our key
idea is to cast the problem of assessing a classifier’s invariance as the computation of
geodesics along the manifold of transformed images. We propose the Manitest method,
built on the efficient Fast Marching algorithm, to compute the invariance of classifiers.

Our new method quantifies in particular the importance of data augmentation for learning
invariance from data, and the increased invariance of convolutional neural networks
with depth. We foresee that the proposed generic tool for measuring invariance to a large
class of geometric transformations and arbitrary classifiers will have many applications
for evaluating and comparing classifiers based on their invariance, and will help improve
the invariance of existing classifiers.

1 Introduction

Owing to the considerable research effort recently devoted to computer vision and
machine learning, state-of-the-art image classification systems now reach performance
close to that of the human visual system in terms of accuracy on some
datasets [D3, E3]. The question arises as to what differences remain between the human
visual system and state-of-the-art classifiers. We focus here on one key difference, namely
the problem of invariance to geometric transformations. While the human visual system is
invariant to some extent to geometric transformations, it is unclear whether automatic
classifiers enjoy the same invariance properties. The importance of invariance in classifiers
has been outlined in recent works [PZ2, ED], and effective solutions for transformation-invariant
classification have been proposed by either adapting the classification rules with proper
distance metrics [O, ED, IZ3, ED], or by improving the features used for classification
[ffl, i, IZ3]. To validate such new design choices and to understand how to further improve
classifiers’ invariance, it becomes essential to develop general methods to properly measure
the robustness of classifiers to geometric transformations of data samples. Previous works
have proposed methods to evaluate the invariance of classifiers, either by controlled changes
in simple images [□], or by specific tests for features of popular neural network
architectures [□]. These

©2015. The copyright of this document resides with its authors. 

It may be distributed unchanged freely in print or electronic forms. 




FAWZI, FROSSARD: MANITEST: ARE CLASSIFIERS REALLY INVARIANT? 


previous studies are however limited, as they are restricted to one-dimensional transformations
(e.g., rotations only), to particular types of classifiers (e.g., neural networks) or to
simple images (e.g., sinusoidal images), and are based on heuristically-driven quantities.
Another approach for measuring invariance consists in generating datasets with transformed
images, and measuring the accuracy of classifiers on these datasets [D32, EH, ED]. This is
however laborious, as it involves building a novel, well-designed dataset to compare all
classifiers on common ground.

In this paper, we propose a principled and systematic method to measure the robustness
of arbitrary image classifiers to geometric transformations. In particular, we design a new
framework that can be applied to any Lie group $\mathcal{T}$ and to any classifier $f$, regardless
of the particular nature of the classifier. For a given image, we define the invariance measure as
the minimal distance between the identity transformation and a transformation in $\mathcal{T}$ that is
sufficient to change the decision of the classifier $f$ on that image. In order to define the
transformation metric, our novel key idea is to represent the set of transformed versions of
an image as a manifold; the transformation metric is then naturally captured by the geodesic
distance on the manifold. Hence, for a given image, our invariance measure essentially
corresponds to the minimal geodesic distance on the manifold that leads to a point where the
classifier’s decision is changed. A global invariance measure is then derived by averaging
over a sufficiently large sample set. Equipped with our generic definition of invariance,
we leverage the techniques used in the analysis of manifolds of transformed visual patterns
[0, 113, EE] and design the Manitest method, built on the efficient Fast Marching algorithm
[O, E3], to compute the invariance of classifiers.

Using Manitest, we quantitatively show the following results: (i) convolutional neural
networks and scattering transforms are considerably more invariant than SVM classifiers;
(ii) two classifiers can have a similar accuracy, but different invariance scores; (iii) the
invariance of convolutional neural networks improves with network depth; (iv) on a natural
image classification task, baseline convolutional networks are not invariant to slight
combinations of translations, rotations, and dilations; (v) data augmentation can dramatically
increase the invariance of a classifier. The latter result is particularly surprising, as an SVM
with RBF kernel trained on augmented samples can exceed the invariance of convolutional
neural networks (without data augmentation) on a handwritten digits dataset. Besides
these results, we showcase examples illustrating the introduced invariance scores. By
providing a systematic tool to assess classifiers in terms of their robustness to geometric
transformations, we bridge a gap towards understanding the invariance properties of different
families of classifiers, which will hopefully lead to building new classifiers that perform
closer to the human visual system. The code of Manitest is available on the project website.¹

2 Problem formulation 

2.1 Definitions 

We consider a mathematical model where images are represented as functions $I : \mathbb{R}^2 \to \mathbb{R}$,
and we denote by $L^2$ the space of square integrable images. Let $\mathcal{T}$ be a Lie group consisting
of geometric transformations on $\mathbb{R}^2$, and denote by $p$ the dimension of $\mathcal{T}$ (i.e., the number
of free parameters). For any transformation $\tau$ that belongs to $\mathcal{T}$, we denote by $I_\tau$ the image $I$
transformed by $\tau$; that is, $I_\tau(x, y) = I(\tau^{-1}(x, y))$. Examples of Lie groups include the
rotation group $SO(2)$ ($p = 1$, described by one angle) and the similarity group ($p = 4$, described
by a 2D translation vector, a dilation and an angle).

¹ http://sites.google.com/site/invmanitest/

Consider an image classification task, where the images are assigned discrete labels in
$C = \{1, \dots, L\}$, and let $f$ be an arbitrary image classifier. Formally, $f$ is a function defined
on the space of square integrable images $L^2$ that takes values in the set $C$. Our goal in this
paper is to evaluate the invariance of $f$ with respect to $\mathcal{T}$. Given an image $I$, we define the
invariance score of $f$ relative to $I$, $\Delta_{\mathcal{T}}(I; f)$, to be the minimal normalized distance from the
identity transformation to a transformation $\tau$ that changes the classification label, i.e.,

$$\Delta_{\mathcal{T}}(I; f) = \min_{\tau \in \mathcal{T}} \frac{d(e, \tau)}{\|I\|_{L^2}} \quad \text{subject to} \quad f(I_\tau) \neq f(I), \tag{1}$$

where $e$ is the identity element of the group $\mathcal{T}$ and $d : \mathcal{T} \times \mathcal{T} \to \mathbb{R}^+$ is a metric on $\mathcal{T}$ that
we define later (Section 2.2). The invariance score quantifies the resilience of $f$ to
transformations in $\mathcal{T}$: larger values of $\Delta_{\mathcal{T}}(I; f)$ indicate a larger invariance. It is worth
noting that our definition of $\Delta_{\mathcal{T}}$ is related to the recent work in [E3] that defined adversarial
noise as the minimal perturbation (in the Euclidean sense) required to misclassify a
datapoint. However, instead of considering generic adversarial perturbations, we focus on
minimal geometric transformations, with a metric borrowed from the group $\mathcal{T}$.

For a given distribution of datapoints $\mu$, the global invariance score of $f$ to
transformations in $\mathcal{T}$ is defined by

$$\rho_{\mathcal{T}}(f) = \mathbb{E}_{I \sim \mu}\, \Delta_{\mathcal{T}}(I; f). \tag{2}$$

The quantity $\rho_{\mathcal{T}}(f)$ depends on $f$ as well as on the distribution of datapoints $\mu$. However,
to simplify notation, we omit the dependence on $\mu$, assuming the distribution is
clear from the context. In practical classification tasks, the true underlying distribution $\mu$ is
generally unknown. In that case, we estimate the global resilience by taking the empirical
average² over $m$ training points $I_1, \dots, I_m$: $\hat{\rho}_{\mathcal{T}}(f) = \frac{1}{m} \sum_{j=1}^{m} \Delta_{\mathcal{T}}(I_j; f)$.
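As a minimal sketch of this empirical estimate, the following Python snippet averages per-image invariance scores and reports a normal-approximation confidence half-width, in the spirit of the footnote on choosing the number of samples. The function name, the 1.96 factor (a roughly 95% interval) and the example scores are assumptions made for illustration; they are not values from our experiments.

```python
import math

def global_invariance_score(scores):
    """Empirical mean of per-image scores and an approximate 95% half-width.

    `scores` is a list of per-image invariance scores; the normal
    approximation used for the half-width is an assumption of this sketch.
    """
    m = len(scores)
    mean = sum(scores) / m
    if m < 2:
        return mean, float("inf")
    # Unbiased sample variance, then standard error of the mean.
    var = sum((s - mean) ** 2 for s in scores) / (m - 1)
    half_width = 1.96 * math.sqrt(var / m)
    return mean, half_width

# Hypothetical per-image scores (made-up numbers, for illustration only).
example_scores = [1.2, 0.8, 1.5, 1.1, 0.9, 1.3]
rho_hat, ci = global_invariance_score(example_scores)
```

In such a setup, one would keep adding randomly drawn training images until the half-width falls below a chosen tolerance.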

2.2 Transformation metric 

We now introduce the distance used in the invariance score $\Delta_{\mathcal{T}}(I; f)$. It should
be noted that $\mathcal{T}$ is possibly a multi-dimensional group (i.e., the transformations in $\mathcal{T}$ are
described by several parameters of different nature, such as translation, rotation and scale); hence,
defining a trivial metric that measures the absolute distance between transformation
parameters is of limited interest, as it combines parameters of possibly different nature. Instead, a
more relevant notion of distance is one that depends on the underlying image $I$. In that case,
$d(\tau_1, \tau_2)$ quantifies the change in appearance between images $I_{\tau_1}$ and $I_{\tau_2}$, rather than an
absolute distance between the two transformations. Consider for example the image distance
$d_I(\tau_1, \tau_2) = \|I_{\tau_1} - I_{\tau_2}\|_{L^2}$. While $d_I$ explicitly depends on the underlying image $I$, it fails to
capture the intrinsic geometry of the family of transformed images. To illustrate this point,
we consider a simple example of images in Fig. 1 with two transformed versions $I_{\tau_1}$ and $I_{\tau_2}$
of a reference image $I_{\tau_0}$. Note that $d_I(\tau_0, \tau_1) = d_I(\tau_0, \tau_2)$, as both transformed objects have
no intersection with the reference object. However, it is clear that $I_{\tau_2}$ incurred a large rotation
and translation, while $I_{\tau_1}$ underwent a slight vertical translation. Hence, the distance metric
should naturally satisfy $d(\tau_0, \tau_1) < d(\tau_0, \tau_2)$, which is not the case for the image distance.
This is crucial in our setting, as a classifier that recognizes the similarity of the objects in $I_{\tau_2}$

² In practice, it is sufficient to consider an empirical average over a sufficiently large random subset of the training
set. The number of samples is chosen to achieve a small enough confidence interval.
Figure 1: Schematic representation of the problem encountered by using the $L^2$ metric.
Black pixels indicate pixels with value 0, and $I_{\tau_1}, I_{\tau_2}$ are obtained by applying a
combination of rotation and translation to $I_{\tau_0}$. Image taken from [□].

Figure 2: Images along the geodesic path from $I_{\tau_0}$ to $I_{\tau_2}$.

and $I_{\tau_0}$ is certainly more robust to transformations than a classifier that merely recognizes the
similarity between $I_{\tau_1}$ and $I_{\tau_0}$, and should be given a higher score. This example underlines a
well-known fundamental issue with the $L^2$ distance, which fails to capture the intrinsic distance
on the curved manifold of transformed images (see e.g., [9, El]). To correctly capture the
intrinsic structure of the manifold, we define $d$ to be the length of the shortest path belonging
to the manifold (i.e., the geodesic distance). For illustration, we show in Fig. 2 images along
the geodesic path from $\tau_0$ to $\tau_2$; the geodesic distance is then essentially the sum of local $L^2$
distances between transformed images over the geodesic path. We formalize these notions
as follows.
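The saturation of the $L^2$ image distance once objects stop overlapping can be checked on a toy example. The sketch below is only an illustration under simplifying assumptions: it uses hypothetical binary square images and pure translations (rather than the rotation and translation of Fig. 1), and the helper names are ours.

```python
import math

def square_image(size, top, left, side=3):
    """Binary image (list of rows) with a side x side square of ones at (top, left)."""
    return [[1.0 if top <= r < top + side and left <= c < left + side else 0.0
             for c in range(size)] for r in range(size)]

def l2_distance(a, b):
    """Discrete L2 distance between two images of identical size."""
    return math.sqrt(sum((x - y) ** 2
                         for row_a, row_b in zip(a, b)
                         for x, y in zip(row_a, row_b)))

reference = square_image(16, top=2, left=2)
near = square_image(16, top=5, left=2)    # shifted down by 3 pixels: no overlap
far = square_image(16, top=12, left=12)   # shifted much further: no overlap either

d_near = l2_distance(reference, near)
d_far = l2_distance(reference, far)
```

Both distances are equal here, although the second transformation is clearly larger; this is the behaviour that the geodesic distance is meant to correct.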

Let $\mathcal{M}(I)$ be the family of transformed images $\mathcal{M}(I) = \{I_\tau : \tau \in \mathcal{T}\}$. Equipped with
the $L^2$ metric, $\mathcal{M}(I)$ defines a metric space and a continuous submanifold of $L^2$. Following
the works of [HI, EB] that considered similar manifolds in different contexts, we call $\mathcal{M}(I)$
an Image Appearance Manifold (IAM), and we follow here their approach. Assuming that
$\gamma : [0,1] \to \mathcal{T}$ is a $C^1$ curve in $\mathcal{T}$, and that $I_{\gamma(t)}$ is differentiable with respect to $t$, we define
the length $L(\gamma)$ of $\gamma$ as

$$L(\gamma) = \int_0^1 \left\| \frac{d}{dt} I_{\gamma(t)} \right\|_{L^2} dt. \tag{3}$$

Note that Eq. (3) is expressed in terms of the $L^2$ metric in the image appearance manifold
and corresponds to summing the local $L^2$ distances between transformed images over the
path $I_\gamma$. We now show that $L(\gamma)$ can be expressed as a length associated with a Riemannian
metric on $\mathcal{T}$, which we derive next. Defining the map

$$F : \mathcal{T} \to \mathcal{M}(I), \quad \tau \mapsto I_\tau,$$

we have

$$\frac{d}{dt} I_{\gamma(t)} = (F \circ \gamma)'(t) = dF_{\gamma(t)}(\gamma'(t)),$$

where $dF_\tau$ denotes the differential of $F$ at $\tau$, and $\gamma'$ is the derivative of $\gamma$. It follows that

$$L(\gamma) = \int_0^1 \sqrt{g_{\gamma(t)}(\gamma'(t), \gamma'(t))}\, dt,$$

where $g_\tau$ is the Riemannian metric (i.e., a positive bilinear form on $T_\tau \mathcal{T}$, the tangent space
of $\mathcal{T}$ at $\tau$), given by

$$g_\tau(v, w) = \langle dF_\tau(v), dF_\tau(w) \rangle_{L^2} \quad \text{for all } v, w \in T_\tau \mathcal{T}.$$

Note that $g$ can be equivalently seen as the pullback of the $L^2$ metric on $\mathcal{M}(I)$ along $F$. By
choosing a basis of the tangent space, the length $L(\gamma)$ can be equivalently written

$$L(\gamma) = \int_0^1 \sqrt{\gamma'(t)^T G_{\gamma(t)}\, \gamma'(t)}\, dt,$$

where $G_\tau$ is the $p \times p$ positive definite matrix associated with the bilinear form $g_\tau$.

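Example 1 (Rotation, $\mathcal{T} = SO(2)$). The transformation group $\mathcal{T}$ is parametrized by a
rotation angle $\theta$ ($p = 1$). In this case, the matrix $G_\theta$ is of size 1 by 1, and equal to

$$G_\theta = \left\| \frac{\partial I_\theta}{\partial \theta} \right\|_{L^2}^2. \qquad \square$$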


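Example 2 (Dilation+Rotation). The group $\mathcal{T}$ has 2 degrees of freedom, namely a scale
parameter $a$ and a rotation angle $\theta$. The Riemannian metric reads

$$G_\tau = \begin{pmatrix}
\left\langle \frac{\partial I_\tau}{\partial a}, \frac{\partial I_\tau}{\partial a} \right\rangle_{L^2} & \left\langle \frac{\partial I_\tau}{\partial a}, \frac{\partial I_\tau}{\partial \theta} \right\rangle_{L^2} \\
\left\langle \frac{\partial I_\tau}{\partial \theta}, \frac{\partial I_\tau}{\partial a} \right\rangle_{L^2} & \left\langle \frac{\partial I_\tau}{\partial \theta}, \frac{\partial I_\tau}{\partial \theta} \right\rangle_{L^2}
\end{pmatrix}. \qquad \square$$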
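To make the dilation and rotation metric concrete, the following sketch approximates its four entries numerically. Everything specific in it is an assumption chosen for illustration: the analytic test image (an off-centre anisotropic Gaussian blob), the integration window, the grid resolution, and the use of central finite differences and a Riemann sum in place of exact derivatives and integrals.

```python
import math

def image(x, y):
    """Hypothetical smooth test image: an off-centre anisotropic Gaussian blob."""
    return math.exp(-((x - 0.5) ** 2) / 0.5 - (y ** 2) / 0.1)

def transformed(a, theta, x, y):
    """I_tau(x, y) = I(tau^{-1}(x, y)) for tau = dilation by a composed with rotation by theta."""
    c, s = math.cos(theta), math.sin(theta)
    # Inverse transformation: rotate by -theta, then divide by the scale a.
    xr, yr = (c * x + s * y) / a, (-s * x + c * y) / a
    return image(xr, yr)

def metric_matrix(a, theta, half_width=3.0, n=121, eps=1e-4):
    """Approximate the 2x2 matrix of L2 inner products of the partial derivatives."""
    h = 2 * half_width / (n - 1)
    g_aa = g_at = g_tt = 0.0
    for i in range(n):
        for j in range(n):
            x, y = -half_width + i * h, -half_width + j * h
            # Central finite differences with respect to scale and angle.
            d_a = (transformed(a + eps, theta, x, y)
                   - transformed(a - eps, theta, x, y)) / (2 * eps)
            d_t = (transformed(a, theta + eps, x, y)
                   - transformed(a, theta - eps, x, y)) / (2 * eps)
            # Riemann-sum approximation of the L2 inner products.
            g_aa += d_a * d_a * h * h
            g_at += d_a * d_t * h * h
            g_tt += d_t * d_t * h * h
    return [[g_aa, g_at], [g_at, g_tt]]

G = metric_matrix(a=1.0, theta=0.0)
```

The resulting matrix is symmetric by construction, and for this test image its determinant is positive, in line with the positive definiteness stated above.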


Having defined the length of a curve on $\mathcal{T}$, the geodesic distance between two points
$\tau_1, \tau_2 \in \mathcal{T}$ is defined as the length of the shortest curve joining the two points:

$$d(\tau_1, \tau_2) = \inf \{ L(\gamma) : \gamma \in C^1([0,1]), \ \gamma(0) = \tau_1, \ \gamma(1) = \tau_2 \}.$$
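The difference between the curve length and the direct $L^2$ distance can be illustrated numerically in the simplest case of a one-parameter translation, where the only curve joining two shifts without backtracking is the straight one. The sketch below assumes a hypothetical 1D Gaussian "image" and approximates the length by summing the $L^2$ distances between consecutive transformed images; the names and grid are ours.

```python
import math

XS = [-8 + 0.25 * i for i in range(65)]   # sample points of a hypothetical 1D image

def blob(shift):
    """1D toy image: a Gaussian bump translated by `shift`."""
    return [math.exp(-(x - shift) ** 2) for x in XS]

def l2(a, b):
    """Discrete L2 distance between two sampled images."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def path_length(shift, steps=200):
    """Approximate L(gamma) for gamma(t) = t * shift by summing local L2 distances."""
    pts = [blob(shift * k / steps) for k in range(steps + 1)]
    return sum(l2(pts[k], pts[k + 1]) for k in range(steps))

len_small, len_large = path_length(2.0), path_length(4.0)
chord_small, chord_large = l2(blob(0.0), blob(2.0)), l2(blob(0.0), blob(4.0))
```

In this toy setting the path length doubles when the shift doubles, whereas the direct $L^2$ distance barely changes once the two bumps no longer overlap.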


Our problem therefore consists in computing the global invariance score, or
equivalently $\Delta_{\mathcal{T}}(I; f)$ defined in Eq. (1), where $d$ is the geodesic distance. In other words,
our problem becomes that of computing the minimal geodesic distance from the identity
transformation to a transformation that is sufficient to change the estimated label of $I$.


3 Invariance score computation 


The key to an efficient and accurate approximation of $\Delta_{\mathcal{T}}(I; f)$ lies in the effective
computation of geodesics on the manifold $(\mathcal{T}, G)$, which we address as follows.

Let $u(\tau) = d(e, \tau)$ be the geodesic map that measures the geodesic distance between the
(fixed) identity element $e$ and $\tau$. The geodesic map satisfies the following Eikonal equation

$$\|\nabla u(\tau)\|_{G_\tau^{-1}} = 1 \ \text{ for } \tau \in \mathcal{T} \setminus \{e\}, \quad \text{and} \quad u(e) = 0, \tag{4}$$

where $\|x\|_A = \sqrt{\langle x, x \rangle_A}$ with $\langle x, y \rangle_A = x^T A y$. Moreover, it was proved in [□] that the geodesic
map $u$ is the unique viscosity solution of the Eikonal equation, provided that $\tau \mapsto G_\tau$ is
continuous. Many numerical schemes rely on the Eikonal equation characterization to
approximate the geodesic map. We use here the popular Fast Marching (FM) method [□], a fast
front propagation approach that computes the values of the discrete geodesic map in
increasing order. Due to space constraints, we only provide a brief description of FM, and focus
on the case where the manifold $\mathcal{T}$ is two-dimensional (i.e., $p = 2$). The extension to arbitrary
dimensions is straightforward, and we refer to [ED, 123] for more complete explanations and
computations.
Algorithm 1 Manitest method (with $p = 2$) for computing $\Delta_{\mathcal{T}}(I; f)$

  Initialize $U(e) = 0$, $U = \infty$ otherwise, and tag all nodes as unknown.
  while termination criterion is not met do
    Select the unknown node $\tau_{\min}$ that achieves minimal distance $U$.
    Tag $\tau_{\min}$ as known.
    If $f(I_{\tau_{\min}}) \neq f(I)$, set $\Delta_{\mathcal{T}}(I; f) \leftarrow U(\tau_{\min}) / \|I\|_{L^2}$ and terminate.
    for all unknown $\tau \in \mathcal{N}(\tau_{\min})$ do
      Update $U(\tau)$ to be the minimum of itself, $U(\tau_{\min}) + \|\tau - \tau_{\min}\|_{G_\tau}$, and the expression in Eq. (5).
    end for
  end while


We assume that the manifold $\mathcal{T}$ is sampled using a regular grid; let $\mathcal{T}_h$ be the sampling of $\mathcal{T}$,
and $U$ be the discrete vector that approximates $u$ at the nodes. The structure of Fast Marching
is almost identical to Dijkstra’s algorithm for computing shortest paths on graphs [B]. The main
difference lies in the update step, which bypasses the constraint of propagation along edges.
For a given node $\tau$, define $\mathcal{N}(\tau)$ to be the set of neighbours of $\tau$ (see illustration in Fig. 3).
In the FM algorithm, each grid point is tagged either as Known (nodes for which the distance is
frozen), or Unknown (nodes for which the distance can change in subsequent iterations). Initially,
the grid points are set to Unknown, and $U$ is set to $\infty$, except $U(e)$, which is set to zero. At each
iteration of FM, the unknown node $\tau_{\min}$ with smallest $U$ is selected, and tagged as Known. Then,
each unknown neighbour $\tau \in \mathcal{N}(\tau_{\min})$ is visited, and $U(\tau)$ is updated as follows: $U(\tau)$ is set
to be the minimum of itself, $U(\tau_{\min}) + \|\tau - \tau_{\min}\|_{G_\tau}$ and

$$\min_{t \in [0,1]} \; t\, U(\tau_{\min}) + (1 - t)\, U(\tilde{\tau}) + \| t\, \tau_{\min} + (1 - t)\, \tilde{\tau} - \tau \|_{G_\tau}, \tag{5}$$

for each known $\tilde{\tau}$ such that $(\tau, \tau_{\min}, \tilde{\tau})$ forms a triangle (see Fig. 3). It is worth noting
that, unlike Dijkstra’s algorithm, FM seeks the optimal point (possibly outside the set $\mathcal{T}_h$) on the
neighbourhood boundary that minimizes the estimated distance at $\tau$, under a linear approximation
assumption (Eq. 5). Fortunately, the problem in Eq. (5) can be solved in closed form, as it
corresponds to the minimization of a scalar quadratic equation [IZB].

Figure 3: Schematic representation of the discretized manifold $\mathcal{T}_h$, and the Fast Marching
update rule. In this figure, we have $\mathcal{N}(\tau) = \{\tilde{\tau}, \tau_{\min}, \ldots, \tau_b\}$.
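The triangle update of Eq. (5) can be sketched as follows. Note that this sketch does not use the closed-form solution mentioned above: as a simpler substitute, it minimizes the (convex) cost over $t \in [0, 1]$ by ternary search. The function names and the isotropic example are assumptions made for illustration.

```python
import math

def metric_norm(v, G):
    """||v||_G = sqrt(v^T G v) for a 2-vector v and a 2x2 matrix G."""
    q = (v[0] * (G[0][0] * v[0] + G[0][1] * v[1])
         + v[1] * (G[1][0] * v[0] + G[1][1] * v[1]))
    return math.sqrt(max(q, 0.0))

def triangle_update(tau, tau_min, tau_tilde, u_min, u_tilde, G, iters=200):
    """Numerically minimize the expression of Eq. (5) over t in [0, 1].

    The cost is a linear term plus the norm of an affine function of t,
    hence convex, so ternary search converges to its minimum.
    """
    def cost(t):
        p = (t * tau_min[0] + (1 - t) * tau_tilde[0] - tau[0],
             t * tau_min[1] + (1 - t) * tau_tilde[1] - tau[1])
        return t * u_min + (1 - t) * u_tilde + metric_norm(p, G)

    lo, hi = 0.0, 1.0
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if cost(m1) <= cost(m2):
            hi = m2
        else:
            lo = m1
    return cost((lo + hi) / 2)

# Isotropic example: unit grid, identity metric, source at (0, 0).
identity = [[1.0, 0.0], [0.0, 1.0]]
value = triangle_update((1, 1), (0, 1), (1, 0), 1.0, 1.0, identity)
```

In this isotropic example the update yields $1 + 1/\sqrt{2} \approx 1.707$ at the diagonal node, whereas an edge-only update $U(\tau_{\min}) + \|\tau - \tau_{\min}\|_{G_\tau}$ would give 2; the true Euclidean distance is $\sqrt{2}$.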

The Manitest method, which applies the FM algorithm to compute $\Delta_{\mathcal{T}}(I; f)$, is given in
Algorithm 1 for the two-dimensional case. The algorithm stops as soon as a transformation
that changes the classification label is found.³ The nodes and metrics are generated on the
fly, to avoid spending resources on far-away nodes that lie farther than the minimal
transformation satisfying $f(I) \neq f(I_\tau)$ and are therefore never visited.

³ To ensure the termination of the algorithm (even if no successful transformation is found), we limit the
number of iterations $N$ to 50,000. However, in all our experiments, this limit was never reached, and the algorithm
terminated by successfully finding a transformation that satisfies $f(I_\tau) \neq f(I)$.
The complexity of Manitest is $O(N \log N)$, where $N$ is the number of visited nodes, if
a min-heap structure is used [EE] (for constant $p$, and constant cost for the evaluation of $f$). It
is important to note however that the complexity of the algorithm has an exponential
dependence on the dimension $p$, since our method involves the enumeration of simplices in
dimension $p$; this is however not a major limitation, as our main focus is on low-dimensional
transformation groups (e.g., $p \le 6$ for affine transformations).
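The overall front-propagation loop with a min-heap can be sketched as below. This is a simplified illustration, not the Manitest implementation: it uses a Dijkstra-style update along grid edges only (without the triangle update of Eq. (5)), approximates each local length by the $L^2$ distance between neighbouring transformed images, and relies on a hypothetical toy image and toy classifier for the translation group. All names are assumptions of the sketch.

```python
import heapq
import math

STEP = 0.5                                # translation step of the grid
XS = [-4 + 0.25 * i for i in range(33)]   # pixel centres of a hypothetical 33x33 frame

def render(tx, ty):
    """Toy image (flattened): Gaussian blob centred at (-0.75, 0), translated by (tx, ty)."""
    return [math.exp(-((x + 0.75 - tx) ** 2 + (y - ty) ** 2)) for y in XS for x in XS]

def l2_dist(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def toy_classifier(img):
    """Hypothetical classifier: label 0 if most mass lies in the left half, else 1."""
    left = sum(v for k, v in enumerate(img) if XS[k % len(XS)] < 0)
    right = sum(v for k, v in enumerate(img) if XS[k % len(XS)] > 0)
    return 0 if left > right else 1

def invariance_score(max_iters=50000):
    """Dijkstra-style propagation from the identity until the label changes."""
    base = render(0.0, 0.0)
    label = toy_classifier(base)
    norm = math.sqrt(sum(v * v for v in base))
    cache = {(0, 0): base}

    def img(node):
        # Transformed images are generated on the fly and cached.
        if node not in cache:
            cache[node] = render(node[0] * STEP, node[1] * STEP)
        return cache[node]

    dist = {(0, 0): 0.0}
    known = set()
    heap = [(0.0, (0, 0))]
    for _ in range(max_iters):
        if not heap:
            break
        u, node = heapq.heappop(heap)
        if node in known:
            continue                      # stale heap entry
        known.add(node)
        if toy_classifier(img(node)) != label:
            return u / norm, (node[0] * STEP, node[1] * STEP)
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nb = (node[0] + di, node[1] + dj)
            if nb in known:
                continue
            cand = u + l2_dist(img(node), img(nb))   # edge-only update
            if cand < dist.get(nb, math.inf):
                dist[nb] = cand
                heapq.heappush(heap, (cand, nb))
    return None

score, tau = invariance_score()
```

On this toy problem the first label change is found at a horizontal translation of one unit, i.e. two grid steps. Restricting propagation to grid edges generally overestimates geodesic distances along diagonal directions, which is the limitation that the Fast Marching update is designed to reduce.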

Finally, we note that when the metric is isotropic (i.e., $G_\tau$ is proportional to the identity
matrix for all $\tau$), FM provides a consistent scheme. That is, as the discretization step tends
to zero, the solution computed by the algorithm tends towards the viscosity solution of the
Eikonal equation. For arbitrary anisotropic metrics, however, consistency is not guaranteed,
and the exact computation of the geodesics becomes much more difficult and
computationally demanding (see [□, E3, E3, El]). Nevertheless, we observed that the anisotropy
of the considered metric is generally not very large in the vicinity of $e$ (although it exceeds
the theoretical limit of guaranteed consistency). This leads to empirically accurate estimates
of the geodesic distance using Manitest when the discretization step is sufficiently small.
We stress that all previous methods addressing metric anisotropy can readily
be applied to our setting, and we leave that as future work.

4 Experiments 

We now present a set of experiments to study the invariance of classifiers in different
settings. In particular, we consider the following transformation groups:

• $\mathcal{T}_{\text{trans}}$: in-plane translations of the image ($p = 2$),

• $\mathcal{T}_{\text{dil+rot}}$: dilations and rotations around the center of the image ($p = 2$),

• $\mathcal{T}_{\text{sim}}$: similarity transformations that describe combinations of translations, dilations
and rotations around the center of the image ($p = 4$).

In all experiments, Manitest uses a discretization step of 0.5 pixels for translations, $\pi/20$
radians for rotations, and 0.1 for dilations. Finally, the transformed images have
the same size as the original image, and we use a zero-padding boundary condition.

4.1 Handwritten digits dataset 

We first compare the invariance of different classifiers on the MNIST handwritten digits 
dataset Dzm. We consider the following classifiers: 

1. Linear SVM [DU], 

2. SVM with RBF kernel [E], 

3. Convolutional Neural Network [E3]: we employ a baseline architecture with two 
hidden layers containing each a convolution operation (5x5 filters with 32 feature 
maps for the first layer and 64 for the second layer), a rectified linear unit nonlinearity, 
and a max pooling over 2x2 windows followed by a subsampling. The architecture 
is trained with stochastic gradient descent, with a softmax loss. 

4. Scattering transform followed by a generative PCA classifier. We used the same 
settings as in [□], and we refer to that paper for more details. 

Table 1 reports the performance of the different classifiers under study, and their
invariance scores $\rho_{\mathcal{T}}(f)$ computed using Manitest. As expected, the linear and RBF-SVM classifiers
compare poorly to the other classifiers in terms of invariance. This is due to the construction of
the CNN and Scat. PCA, which explicitly take invariance into account through pooling
operations, while the others do not. Moreover, it can be noted that Scat. PCA outperforms CNN
in terms of robustness to translations and global similarity transformations, even if the two
| Group | L-SVM | RBF-SVM | CNN | Scat. PCA |
|---|---|---|---|---|
| Test error (%) | 8.4 | 1.4 | 0.7 | 0.8 |
| Translations ($\mathcal{T} = \mathcal{T}_{\text{trans}}$) | 0.8 | 1.3 | 1.7 | 2.1 |
| Dilations + Rotations ($\mathcal{T} = \mathcal{T}_{\text{dil+rot}}$) | 0.8 | 1.5 | 1.9 | 1.8 |
| Similarity ($\mathcal{T} = \mathcal{T}_{\text{sim}}$) | 0.6 | 1.1 | 1.5 | 1.6 |

Table 1: Accuracy and invariance scores of different classifiers on the MNIST dataset.



[Figure 4: panels (a) and (b); horizontal axis: Rotation]

Figure 4: Distance map with the $\mathcal{T}_{\text{dil+rot}}$ group (a), and correctly classified regions (b), for the
four tested classifiers on an example image of digit “4”. Geodesic paths are also shown.


classifiers have similar test error. This result is in agreement with the theoretical evidence 
[B, El] showing that scattering classifiers are invariant to deformations. 


To gain further insight into the invariance of the classifiers, we focus on the two-dimensional
group $\mathcal{T}_{\text{dil+rot}}$, and show in Fig. 4 (a) the geodesic distance map for an example image of
digit “4”, computed starting from the identity transformation (shown by a red dot at the center).
Moreover, we overlay the minimally transformed images that change the labels of each of the
classifiers, along with the corresponding geodesic paths. On this example, the Scat. PCA
classifier is the most robust: a large dilation, accompanied by a rotation, is required to change
the classification label. In contrast, the linear SVM is easily “fooled” by a slight dilation. In
Fig. 4 (b) we illustrate in white the region of the Rotation-Scale plane where the classifier
outputs the correct label “4”. Interestingly, the CNN and Scat. PCA classifiers are largely
invariant to dilations (indicated by the vertical shape of the white region), while being
moderately robust to rotations.

Figure 5: Invariance score versus number of additional training samples, for MNIST, with
$\mathcal{T} = \mathcal{T}_{\text{sim}}$.


In vision tasks, it is common practice to augment the training data with artificial examples
obtained by slightly distorting the original examples to achieve invariance. Although
this practice is known to improve the classification performance of classifiers on many
tasks, its effect on the invariance of the classifier is not quantitatively understood. Fig. 5
illustrates the Manitest invariance scores for L-SVM and RBF-SVM classifiers trained on
augmented training sets obtained by randomly generating transformations⁴ from the
similarity group $\mathcal{T}_{\text{sim}}$, on the MNIST dataset. Both classifiers improve their invariance score as
more transformed samples are added to the training set. This result moreover has an
element of surprise, as RBF-SVM succeeds in improving its invariance score by around 50%
with the mere addition of artificial examples to the training set, and exceeds the invariance
of CNN (without data augmentation). Moreover, the obtained score is comparable to that of the
Scat. PCA classifier, which is carefully designed to satisfy invariance properties. This experiment
makes it possible to characterize the actual power of data augmentation for learning invariance
from the data.
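A minimal sketch of how such random similarity transformations could be drawn is given below. The bounds are those stated in the footnote on random transformations (translation of at most 3 pixels in each direction, scale between 0.7 and 1.3, rotation of at most 0.2 radians); the uniform sampling, the function name and the 2x3 matrix convention are assumptions of this sketch.

```python
import math
import random

def random_similarity(rng, max_shift=3.0, scale_range=(0.7, 1.3), max_angle=0.2):
    """Draw a random similarity transformation as a 2x3 affine matrix.

    The matrix maps (x, y) to a * R_theta (x, y) + (tx, ty). Uniform sampling
    within the bounds is an assumption of this sketch.
    """
    tx = rng.uniform(-max_shift, max_shift)
    ty = rng.uniform(-max_shift, max_shift)
    a = rng.uniform(*scale_range)
    theta = rng.uniform(-max_angle, max_angle)
    c, s = a * math.cos(theta), a * math.sin(theta)
    return [[c, -s, tx], [s, c, ty]]

rng = random.Random(0)   # fixed seed so that the draw is reproducible
transforms = [random_similarity(rng) for _ in range(5)]
```

Each matrix would then be applied to a training image (with zero padding at the boundary) to produce one additional training sample.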

4.2 Natural images 

In this second experimental section, we perform experiments on the CIFAR-10 dataset [O].
We focus on baseline CNN classifiers, and learn architectures with 1, 2 and 3 hidden layers.
Specifically, each layer consists of a successive combination of convolutional, rectified linear
unit and pooling operations. The convolutional layers consist of 5 x 5 filters with
respectively 32, 32 and 64 feature maps for each layer, and the pooling operations are done on a
window of size 3 x 3 with a stride parameter of 2. We build the three architectures gradually,
by successively stacking a new hidden layer on top of the previous architecture (kept fixed).
The last hidden layer is then connected to a fully connected layer, and the softmax loss is
used. Moreover, the different architectures are trained with stochastic gradient descent. On
the test set, the errors of the three architectures are respectively 35.6%, 25.0% and 22.7%.



[Figure 6 panels: (a) Translations, (b) Dilation + Rotation, (c) Similarity]

Figure 6: Invariance scores of CNNs on $\mathcal{T}_{\text{trans}}$, $\mathcal{T}_{\text{dil+rot}}$ and $\mathcal{T}_{\text{sim}}$, for the CIFAR-10 dataset.

We show in Fig. 6 the Manitest invariance scores of the three architectures. Our
approach captures the increasing invariance with the number of layers of the network, for the
three groups under study. This result is in agreement with empirical studies and previously
held belief [ffl, O] that invariance increases with the depth of the network. However, while
previous results measured the invariance with respect to a one-dimensional
transformation group (e.g., rotation only), Manitest provides a systematic and principled way of
verifying the increased invariance of CNNs with depth on more complex Lie groups (e.g.,
similarity transformations). Interestingly, despite the relatively small difference in
performance between the two-layer and three-layer architectures, the
invariance score strongly increases. This highlights again that invariance and performance
measures capture two different properties of classifiers.

⁴ Random transformations are constrained as follows: a translation of at most 3 pixels in each direction, a scaling
parameter between 0.7 and 1.3, and a rotation of at most 0.2 radians.




































[Figure 7 panels: (a) Worst 20, (b) Average 20, (c) Top 20]

Figure 7: Illustration of images having (a) worst, (b) average, (c) top invariance to similarity
transformations (i.e., $\mathcal{T} = \mathcal{T}_{\text{sim}}$), for the three-layer CNN. The odd rows show the original
images, and the even rows show the minimally transformed images changing the prediction
of the CNN. The Manitest invariance score $\Delta_{\mathcal{T}}(I; f)$ is indicated on each transformed image.
All original images are correctly classified by the 3-layer CNN.


Compared to the handwritten digits task, note that the Manitest scores obtained on the
CIFAR task are generally much smaller, which suggests that it is harder to achieve
invariance on this task. To visualize the level of invariance of the 3-layer CNN on the CIFAR-10
dataset, we show in Fig. 7 sorted example images. For images with an average invariance
score or less, note that the distinction between the transformed and original images is
hardly perceptible. This suggests that the CNN is not robust to combinations of translations,
rotations and dilations, even if it achieves a high accuracy. On the other hand, the difference
between the original and the minimally transformed images is clearly perceptible for the
top-scored images, even though a human observer is likely to correctly recognize the class
of the transformed images.

5 Conclusion 

In this paper, we proposed a systematic and rigorous approach for measuring the invariance 
of any classifier to low-dimensional transformation groups. Using a manifold perspective, we 
were able to convert the problem of assessing the classifier’s invariance to that of computing 
geodesic distances. Using Manitest, we quantified the increasing invariance of CNNs with 
depth, and highlighted the importance of data augmentation for learning invariance from 
data. We believe Manitest will be used to perform an in-depth empirical analysis of different 
classification architectures, in order to have a better understanding of the building blocks that 
best preserve invariance, and potentially build more robust classifiers. 